Performance optimisations of the NPB FT kernel by special-purpose unroller
Abstract
The fast Fourier transform (FFT) is the cornerstone of many supercomputer applications and therefore needs careful performance tuning. In practice, however, the performance actually achieved by FFT implementations is often far below acceptable levels. In this paper, we explore several strategies for optimising the performance of the FFT computation, such as enhancing instruction-level parallelism, merging loops, and reducing memory loads and stores by means of a special-purpose automatic loop unroller. Our approach is based on the principle of complete unrolling, which we apply to modify the FT kernel of the NAS Parallel Benchmarks (NPB). In experiments on two different IBM SP2 platforms, our automatically generated unrolled FFT subroutine improves performance by 40% to 53% compared with the original code. Furthermore, the entire 3-D FFT mega-step of the benchmark runs faster than when it calls a comparable FFT subroutine from the vendor-optimised PESSL numerical library. Preliminary results suggest that the completely unrolled code also outperforms FFTW, another high-performance FFT package. Finally, our approach to automatically generating moderately optimised but specialised code requires only a modest programming effort.
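To illustrate what complete unrolling means for an FFT, here is a minimal sketch, assuming the transform length is a power of two fixed at generation time. It is not the authors' unroller (which, per the abstract, modifies the Fortran FT kernel of NPB), and the names `generate_unrolled_fft`, `make_unrolled_fft`, `fft_<n>` and `dft` are hypothetical. The generator emits straight-line radix-2 decimation-in-time code: every loop is removed, twiddle factors are folded in as constants, and the trivial twiddles 1 and -i are specialised so that no multiplication is emitted for them.

```python
import cmath


def bit_reverse(i: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def generate_unrolled_fft(n: int) -> str:
    """Emit Python source for a loop-free radix-2 DIT FFT of fixed size n (hypothetical generator)."""
    assert n > 0 and n & (n - 1) == 0, "n must be a power of two"
    bits = n.bit_length() - 1
    lines = [f"def fft_{n}(x):"]
    # Load inputs once, in bit-reversed order, into named locals a0..a{n-1}.
    for i in range(n):
        lines.append(f"    a{i} = x[{bit_reverse(i, bits)}]")
    size = 2
    while size <= n:                      # generator-side loop; the emitted code has none
        half = size // 2
        for start in range(0, n, size):
            for k in range(half):
                i, j = start + k, start + k + half
                if k == 0:                # twiddle = 1: no multiply needed
                    lines.append(f"    t = a{j}")
                elif 4 * k == size:       # twiddle = -i: swap parts and negate
                    lines.append(f"    t = complex(a{j}.imag, -a{j}.real)")
                else:                     # general twiddle folded in as a constant
                    w = cmath.exp(-2j * cmath.pi * k / size)
                    lines.append(f"    t = {w!r} * a{j}")
                # Butterfly: a{j} must be updated before a{i} is overwritten.
                lines.append(f"    a{j} = a{i} - t")
                lines.append(f"    a{i} = a{i} + t")
        size *= 2
    lines.append("    return [" + ", ".join(f"a{i}" for i in range(n)) + "]")
    return "\n".join(lines) + "\n"


def make_unrolled_fft(n: int):
    """Compile the generated source and return the specialised function."""
    namespace: dict = {}
    exec(generate_unrolled_fft(n), namespace)
    return namespace[f"fft_{n}"]


def dft(x):
    """Naive O(n^2) DFT used only as a reference for checking."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]


if __name__ == "__main__":
    print(generate_unrolled_fft(4))
    fft8 = make_unrolled_fft(8)
    x = [complex(i, -i) for i in range(8)]
    print(max(abs(p - q) for p, q in zip(fft8(x), dft(x))))
```

Printing the source for `n = 4` shows the entire transform as a flat sequence of loads, butterflies and a return, with no loop control or index arithmetic left at run time. In a compiled language the same shape gives the compiler freedom to schedule independent butterflies for instruction-level parallelism and to keep intermediates in registers rather than reloading them from memory, which is the kind of effect the abstract attributes to complete unrolling.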